The wide variety of computer-based technologies in the healthcare industry has led to the accumulation of large amounts of electronic data. Because of this volume of information, medical professionals face the challenge of accurately interpreting symptoms and identifying diseases at an early stage. In medicine, misdiagnosis can lead to inappropriate treatment and to diseases being detected only once they have become serious. Supervised machine learning techniques, however, have shown the potential to outperform conventional diagnostic procedures and to assist medical professionals in diagnosing high-risk diseases. Many people are reluctant to visit a hospital or consult a doctor about a minor complaint, yet such a complaint can carry significant medical risk. At the same time, online medical advice is readily available. The proposed system evaluates the symptoms a person enters as input and returns a predicted disease as output, using a Naive Bayes classifier. The system focuses on accuracy: the more symptoms the person provides, the better the disease prediction. This work aims to strengthen the healthcare industry and contribute to better patient care.
Introduction
Machine learning (ML) algorithms use mathematical and probabilistic techniques to learn from past data and support intelligent decision-making. In healthcare, supervised ML methods are widely applied for disease diagnosis by analyzing large electronic health records, helping doctors detect diseases more accurately and quickly, thus reducing workload and improving patient survival rates.
The proposed solution is an automatic diagnostic system that saves time by letting users enter their symptoms and receive possible disease predictions without first visiting a doctor. The system tests multiple supervised ML models, including K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Trees, Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Logistic Regression, with a focus on the multinomial Naive Bayes algorithm because it handles inputs made up of many symptom features effectively.
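For reference, the standard multinomial Naive Bayes decision rule can be written as follows for a binary symptom vector x = (x_1, ..., x_m), where x_j = 1 if symptom j is selected and 0 otherwise. Here D is the set of diseases, m is the number of symptoms, P(d) is the fraction of training records labeled with disease d, and N_dj is the number of training records of disease d that contain symptom j. The additive smoothing parameter alpha (commonly alpha = 1, i.e. Laplace smoothing) is an assumption, since the paper does not state how probabilities are smoothed.

```latex
\hat{d} = \arg\max_{d \in \mathcal{D}} \left[ \log P(d) + \sum_{j=1}^{m} x_j \log \theta_{dj} \right],
\qquad
\theta_{dj} = \frac{N_{dj} + \alpha}{\sum_{k=1}^{m} N_{dk} + \alpha m}
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the multinomial coefficient is dropped because it is the same for every disease.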
The system is implemented in Python using VS Code and uses a dataset from Columbia University containing 150 diseases and about 810 symptoms. The data are split into 70% for training and 30% for testing. Users select their symptoms through a user-friendly graphical user interface (GUI) built with Tkinter; the system then reads the symptom data, applies Naive Bayes classification, and outputs the likely diseases.
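The classification step described above can be sketched in plain Python. This is a minimal from-scratch sketch, not the authors' code: it assumes each record is a list of symptom names, encodes it as a 0/1 vector over a symptom vocabulary, and applies Laplace smoothing (alpha = 1). The function names encode, fit_multinomial_nb, and predict, and the three-disease toy data, are hypothetical stand-ins for the real dataset.

```python
import math
from collections import Counter, defaultdict


def encode(selected, vocabulary):
    """Map a list of selected symptom names to a 0/1 vector over `vocabulary`."""
    chosen = set(selected)
    unknown = chosen - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if symptom in chosen else 0 for symptom in vocabulary]


def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log P(d) and smoothed log theta_dj for every disease d."""
    n_features = len(X[0])
    class_counts = Counter(y)
    # N_dj: how often symptom j occurs in the training records of disease d
    symptom_counts = defaultdict(lambda: [0] * n_features)
    for row, disease in zip(X, y):
        counts = symptom_counts[disease]
        for j, value in enumerate(row):
            counts[j] += value
    model = {}
    for disease, n_records in class_counts.items():
        counts = symptom_counts[disease]
        denominator = sum(counts) + alpha * n_features
        log_prior = math.log(n_records / len(y))
        log_theta = [math.log((c + alpha) / denominator) for c in counts]
        model[disease] = (log_prior, log_theta)
    return model


def predict(model, x):
    """Return the disease with the highest log-posterior score for vector x."""
    def score(disease):
        log_prior, log_theta = model[disease]
        return log_prior + sum(v * lt for v, lt in zip(x, log_theta))
    return max(model, key=score)


# Hypothetical toy data for illustration; not taken from the paper's dataset.
VOCABULARY = ["fever", "cough", "headache", "rash", "fatigue"]
RECORDS = [
    ("influenza", ["fever", "cough", "fatigue"]),
    ("influenza", ["fever", "cough", "headache"]),
    ("migraine", ["headache", "fatigue"]),
    ("migraine", ["headache"]),
    ("measles", ["fever", "rash", "cough"]),
    ("measles", ["rash", "fever"]),
]
X = [encode(symptoms, VOCABULARY) for _, symptoms in RECORDS]
y = [disease for disease, _ in RECORDS]
model = fit_multinomial_nb(X, y)
print(predict(model, encode(["rash", "fever"], VOCABULARY)))  # measles
```

In a Tkinter front end, the symptoms a user ticks would be passed to encode and then to predict. If a library is preferred, scikit-learn's MultinomialNB class implements the same model with a configurable alpha.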
The results show that the system predicts diseases from symptoms with useful accuracy, offering a faster and more cost-effective alternative to traditional diagnostics. Applying multinomial Naive Bayes to this large dataset yields a best accuracy of approximately 78.6%, and the GUI lets users easily select symptoms and view the predicted diseases.
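The 70/30 evaluation can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes the data are (symptoms, disease) pairs and that the classifier is passed in as a function, and split_70_30 and accuracy are hypothetical helper names. Fixing the random seed makes the split reproducible, so a reported accuracy can be re-checked.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of `records` and return (train, test) with a 70/30 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(classify, test_records):
    """Fraction of (symptoms, disease) pairs where classify(symptoms) == disease."""
    if not test_records:
        raise ValueError("test set is empty")
    correct = sum(1 for symptoms, disease in test_records if classify(symptoms) == disease)
    return correct / len(test_records)


# Hypothetical usage with ten placeholder records and a trivial baseline classifier.
records = [((f"symptom_{i}",), "A" if i % 2 else "B") for i in range(10)]
train, test = split_70_30(records, seed=42)
print(len(train), len(test))  # 7 3
baseline = accuracy(lambda symptoms: "A", test)
print(f"baseline accuracy: {baseline:.2f}")
```

Comparing a trained model against a trivial baseline such as "always predict the most common disease" helps show how much of the measured accuracy the model itself contributes.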
Conclusion
In its current form, the system takes symptoms from the user as input and generates a disease prediction as output. The user can select between one and five symptoms. Accuracy is lower when only one symptom is selected, and it increases as more symptoms are provided. In this paper we have proposed a compact, novel learning model based on the Naive Bayes algorithm. We also attempted to reduce the number of features in the dataset. In this process we obtained sufficient accuracy on all datasets with our machine learning model; the best accuracy we obtained was approximately 78.6%.
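One intuition for why more symptoms tend to help: under Naive Bayes, every selected symptom multiplies the score of each disease by another likelihood factor, so symptoms that are characteristic of one disease sharpen its posterior probability. The sketch below uses hypothetical, already-smoothed probabilities for three diseases (not values from the dataset) and enforces the one-to-five symptom limit; it illustrates the effect on model confidence, not the accuracy figure reported above.

```python
import math

# Hypothetical, already-smoothed model parameters (illustration only).
PRIORS = {"influenza": 1 / 3, "migraine": 1 / 3, "measles": 1 / 3}
THETA = {
    "influenza": {"fever": 3 / 11, "cough": 3 / 11, "headache": 2 / 11, "rash": 1 / 11, "fatigue": 2 / 11},
    "migraine": {"fever": 1 / 8, "cough": 1 / 8, "headache": 3 / 8, "rash": 1 / 8, "fatigue": 2 / 8},
    "measles": {"fever": 3 / 10, "cough": 2 / 10, "headache": 1 / 10, "rash": 3 / 10, "fatigue": 1 / 10},
}
MAX_SYMPTOMS = 5  # the interface accepts one to five symptoms


def posterior(symptoms):
    """Return P(disease | symptoms) for every disease, normalized with log-sum-exp."""
    chosen = set(symptoms)
    if not 1 <= len(chosen) <= MAX_SYMPTOMS:
        raise ValueError("select between one and five symptoms")
    # Unnormalized log-posterior: log prior plus one log-likelihood term per symptom.
    scores = {
        d: math.log(PRIORS[d]) + sum(math.log(THETA[d][s]) for s in chosen)
        for d in PRIORS
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(v - top) for v in scores.values())
    return {d: math.exp(v - top) / norm for d, v in scores.items()}


for selection in (["fever"], ["fever", "rash"]):
    post = posterior(selection)
    best = max(post, key=post.get)
    print(selection, best, f"{post[best]:.2f}")
# ['fever'] measles 0.43
# ['fever', 'rash'] measles 0.69
```

Adding a symptom is not guaranteed to raise the top posterior: in this toy model, adding cough to fever and rash lowers the measles posterior slightly, because cough is also typical of influenza.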
In future work, developing more complex ML algorithms will be essential for improving disease prediction. In addition, the datasets should be expanded to cover different demographics, both to avoid overfitting and to increase the accuracy of the models used. Finally, we plan to incorporate patients' medical reports, especially their last 10 to 20 medical records, so that the system runs smoothly and can reduce the burden on medical staff in many respects.